Post-Processing Independent Evaluation of Sound Event Detection Systems
Due to the high variation in the application requirements of sound event
detection (SED) systems, it is not sufficient to evaluate systems only in a
single operating mode. Therefore, the community recently adopted the polyphonic
sound detection score (PSDS) as an evaluation metric, which is the normalized
area under the PSD receiver operating characteristic (PSD-ROC). It summarizes
the system performance over a range of operating modes resulting from varying
the decision threshold that is used to translate the system output scores into
a binary detection output. Hence, it provides a more complete picture of the
overall system behavior and is less biased by specific threshold tuning.
However, besides the decision threshold there is also the post-processing that
can be changed to enter another operating mode. In this paper we propose the
post-processing independent PSDS (piPSDS) as a generalization of the PSDS.
Here, the post-processing independent PSD-ROC includes operating points from
varying post-processings with varying decision thresholds. Thus, it summarizes
even more operating modes of an SED system and allows for system comparison
without the need to implement a post-processing and without a bias due to
different post-processings. While piPSDS can in principle combine different
types of post-processing, we here, as a first step, present median filter
independent PSDS (miPSDS) results for this year's DCASE Challenge Task4a
systems. Source code is publicly available in our sed_scores_eval package
(https://github.com/fgnt/sed_scores_eval).
Comment: submitted to DCASE Workshop 202
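The idea of pooling operating points over decision thresholds and over median filter lengths can be illustrated with a small sketch. It is a frame-level simplification under stated assumptions: it uses plain true/false positive rates rather than the event-based criteria of the actual PSDS, and it does not use the API of the sed_scores_eval package. All function names (`median_filter`, `roc_points`, `normalized_area`, `filter_independent_area`) and the parameter `max_fpr` are hypothetical and chosen for illustration only.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def median_filter(scores, length):
    """Median-filter a 1-D score sequence with an odd window length (edge-padded)."""
    if length % 2 == 0:
        raise ValueError("length must be odd")
    if length == 1:
        return scores.copy()
    pad = length // 2
    padded = np.pad(scores, pad, mode="edge")
    return np.median(sliding_window_view(padded, length), axis=-1)


def roc_points(scores, target, thresholds):
    """Frame-level (FPR, TPR) operating points, one per decision threshold."""
    target = target.astype(bool)
    n_pos = max(target.sum(), 1)
    n_neg = max((~target).sum(), 1)
    fpr, tpr = [], []
    for thr in thresholds:
        decision = scores > thr
        tpr.append((decision & target).sum() / n_pos)
        fpr.append((decision & ~target).sum() / n_neg)
    return np.array(fpr), np.array(tpr)


def normalized_area(fpr, tpr, max_fpr=1.0):
    """Area under the staircase envelope of the points, normalized by max_fpr."""
    keep = fpr <= max_fpr
    if not keep.any():
        return 0.0
    order = np.argsort(fpr[keep], kind="stable")
    f = fpr[keep][order]
    # Best TPR reachable with an FPR no larger than each f.
    t = np.maximum.accumulate(tpr[keep][order])
    widths = np.diff(np.append(f, max_fpr))
    return float(np.sum(t * widths) / max_fpr)


def filter_independent_area(scores, target, filter_lengths, thresholds, max_fpr=1.0):
    """Pool operating points from all filter lengths before taking the area."""
    points = [
        roc_points(median_filter(scores, n), target, thresholds)
        for n in filter_lengths
    ]
    fpr = np.concatenate([p[0] for p in points])
    tpr = np.concatenate([p[1] for p in points])
    return normalized_area(fpr, tpr, max_fpr)
```

Because the pooled set of operating points contains the points of every individual filter length, the pooled area in this sketch can never be smaller than the area obtained with any single fixed filter length.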
LibriWASN: A Data Set for Meeting Separation, Diarization, and Recognition with Asynchronous Recording Devices
We present LibriWASN, a data set whose design follows closely the LibriCSS
meeting recognition data set, with the marked difference that the data is
recorded with devices that are randomly positioned on a meeting table and whose
sampling clocks are not synchronized. Nine different devices, five smartphones
with a single recording channel and four microphone arrays, are used to record
a total of 29 channels. Other than that, the data set follows closely the
LibriCSS design: the same LibriSpeech sentences are played back from eight
loudspeakers arranged around a meeting table and the data is organized in
subsets with different percentages of speech overlap. LibriWASN is meant as a
test set for clock synchronization algorithms, meeting separation, diarization
and transcription systems on ad-hoc wireless acoustic sensor networks. Due to
its similarity to LibriCSS, meeting transcription systems developed for
LibriCSS can readily be tested on LibriWASN. The data set is recorded in two
different rooms and is complemented with ground-truth diarization information
of who speaks when.
Comment: Accepted for presentation at the ITG conference on Speech
Communication 202